
(ICCV 2017) Learning in an Uncertain World: Representing Ambiguity Through Multiple Hypotheses

Rupprecht C, Laina I, DiPietro R, et al. Learning in an uncertain world: Representing ambiguity through multiple hypotheses[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 3591-3600.



1. Overview


1.1. Motivation

  • uncertainty arises from the way data is labeled (e.g., the labels of occluded joints in human pose estimation)

In this paper

  • reformulate existing single-hypothesis prediction (SHP) models as multiple hypothesis prediction (MHP) models via a meta loss
  • MHP can expose valuable insights about ambiguity in the data
  • outperforms SHP baselines
  • experiments on
    • human pose estimation
    • future frame prediction
    • classification (multi-label)
    • segmentation
  • related work
    • Multiple Choice Learning
    • Multi-label Recognition



2. Methods


2.1. SHP



2.2. MHP

2.2.1. Meta Loss Function



  • M. the number of hypotheses (the output layer is replicated M times with different initializations)
  • Delta. indicator function: 1 if the condition is true, 0 otherwise
  • f_j. the j-th predictor among the M predictors

The meta loss can be regarded as the original loss weighted by Delta.
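Putting the symbol definitions above together, the winner-takes-all meta loss for a single training pair can be written as follows (a reconstruction from the definitions above, with $\mathcal{L}$ denoting the original single-hypothesis loss):

$$
\mathcal{M}\big(f(x), y\big) \;=\; \sum_{j=1}^{M} \delta\!\Big(j = \arg\min_{k} \mathcal{L}\big(f_k(x), y\big)\Big)\, \mathcal{L}\big(f_j(x), y\big)
$$

Only the hypothesis closest to the target (the one whose Voronoi cell contains $y$) receives a non-zero weight, so minimizing the meta loss pushes each predictor toward the targets it already explains best.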

2.2.2. Procedure

  • create M predictors, then forward each sample through all of them
  • assign each target y_i(x) to its closest hypothesis (i.e., build the Voronoi tessellation of the output space)
  • compute gradients and update
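The steps above can be sketched numerically. The function below (an illustrative sketch, not the paper's code; `loss_fn` and `mhp_loss` are assumed names) computes the hard-assignment meta loss for one sample: evaluate all M hypotheses, find the winner, and weight the losses with the indicator Delta.

```python
import numpy as np

def mhp_loss(predictions, y, loss_fn):
    """Winner-takes-all meta loss for one sample.

    predictions: array of shape (M, ...) -- one prediction per hypothesis
    y:           target value
    loss_fn:     original single-hypothesis loss, e.g. squared error
    """
    losses = np.array([loss_fn(p, y) for p in predictions])
    best = int(np.argmin(losses))        # Voronoi cell the target falls into
    delta = np.zeros(len(predictions))   # hard assignment: 1 for the winner, 0 otherwise
    delta[best] = 1.0
    return float(np.dot(delta, losses)), best

# toy example: M = 3 scalar hypotheses with a squared-error loss
sq = lambda p, y: (p - y) ** 2
preds = np.array([0.0, 1.0, 4.0])
loss, winner = mhp_loss(preds, y=1.2, loss_fn=sq)  # hypothesis 1 wins
```

In a real network, only the winning head's loss contributes to the gradient, so each backward pass updates one output branch plus the shared trunk.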

2.2.3. Relax Delta



  • solves the problem that the predictors may be initialized so far from the target labels y that all targets lie in a single Voronoi cell, so only one hypothesis would ever receive gradients
  • additionally, predictions are dropped with a small probability (1%) to add randomness to the selection of the best hypothesis, so that weaker hypotheses do not vanish during training
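The relaxation can be sketched as a soft assignment: the winning hypothesis gets weight 1 − ε and the remaining M − 1 hypotheses share ε, so every head always receives some gradient. This is an illustrative sketch (the function name `relaxed_delta` and the value of `eps` are assumptions, and the 1% hypothesis dropout is omitted for clarity):

```python
import numpy as np

def relaxed_delta(losses, eps=0.05):
    """Soft Delta: 1 - eps for the best hypothesis, eps split over the rest."""
    losses = np.asarray(losses, dtype=float)
    M = len(losses)
    weights = np.full(M, eps / (M - 1))      # small share for the non-winners
    weights[np.argmin(losses)] = 1.0 - eps   # bulk of the weight for the winner
    return weights

w = relaxed_delta([0.9, 0.1, 0.5], eps=0.05)
# weights sum to 1, and index 1 (smallest loss) dominates
```

Because the non-winning weights never reach exactly zero, a head that starts far from all targets is still pulled slowly toward the data instead of being starved of gradient.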

2.2.4. Hyper-parameter M

  • almost every method that models posterior probabilities requires some hand-tuned model hyper-parameter (e.g., the number of clusters in k-means, the number of mixture components in MDNs)



3. Experiments


3.1. Pose



  • SHP. 59.7%
  • 2-MHP. 60.0%
  • 5-MHP. 61.2%
  • 10-MHP. 62.8%

  • with an increasing number of hypotheses, the method models the output space more and more precisely

3.2. Future Frame Predictions




3.3. Classification

  • if an image contains two bikes and a person, each time the image is sampled during training it is labeled as either bike or person, with 50% probability each


3.4. Segmentation

  • MHP (70.3%) vs. MCL (69.1%)